Repository of online information sources, used as test domains for information extraction and wrapper generation tools that learn extraction rules (extraction patterns).
A repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and for learning to extract symbolic knowledge from the World Wide Web.